[fix](user_var)fix integer typing and prefer Variable.realExpression for argument/type resolution#62524
[fix](user_var)fix integer typing and prefer Variable.realExpression for argument/type resolution#62524starocean999 wants to merge 5 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Blocking findings:
fe/fe-core/src/main/java/org/apache/doris/qe/ConnectContext.java:getLiteralForUserVar()still mishandlesLARGEINT.SetUserDefinedVarOp.run()stores user variables viavalue.toLegacyLiteral(), and a NereidsLargeIntLiteralbecomesorg.apache.doris.analysis.LargeIntLiteral, notIntLiteral. The newcase LARGEINTunderliteralExpr instanceof IntLiteralis therefore unreachable. A real user variable likeSET @v = 9223372036854775808still falls through to the finalelseand is converted to a string literal, so the advertised integer-typing fix is incomplete.fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/UserVariableAnalysisTest.java:testUserVarIntegerTypeBig()only coversBIGINT(Long.MAX_VALUE), so the current test suite does not prove theLARGEINTcase above.
Critical checkpoint conclusions:
- Goal of the task: Partially achieved. The early variable-replacement/type-coercion path looks fixed and has unit/regression coverage, but the integer-typing fix is incomplete because
LARGEINTuser variables are still converted incorrectly. - Small/clear/focused: Yes. The PR is narrowly scoped to FE user-variable analysis and tests.
- Concurrency: Not applicable in the touched paths. No new shared-state or locking changes were introduced.
- Lifecycle/static initialization: Not applicable.
- Configuration changes: None.
- Incompatible protocol/storage changes: None.
- Parallel code paths: Checked the analysis, constant-folding, and SQL-cache variable paths. No other missing mirror path was found beyond the remaining
LARGEINTlegacy-literal conversion gap. - Special conditional checks: No blocking issue here.
- Test coverage: Insufficient for the
LARGEINTpath; the new tests only proveINT/BIGINTtyping and the early replacement fix. - Modified test results: The added regression output is plausible for the timing/coercion fix. No obvious
.outmismatch was found from inspection. - Observability: Not needed for this change.
- Transaction/persistence: Not applicable.
- Data writes/modifications: Not applicable.
- FE-BE transmitted variables: Not applicable.
- Performance: No material concern in the touched code.
- Other issues: None beyond the blocking correctness gap and its missing test coverage.
| import java.math.BigInteger; | ||
| import java.nio.ByteBuffer; | ||
| import java.util.ArrayList; | ||
| import java.util.Collections; |
There was a problem hiding this comment.
Blocking: this LARGEINT arm is unreachable in the actual user-variable flow. SetUserDefinedVarOp.run() stores values via value.toLegacyLiteral(), and a Nereids LargeIntLiteral becomes org.apache.doris.analysis.LargeIntLiteral, not IntLiteral. That means SET @v = 9223372036854775808 still falls through to the final else here and is converted to a string literal instead of a Nereids LargeIntLiteral. Please handle legacy LargeIntLiteral explicitly (or reuse Literal.fromLegacyLiteral(literalExpr, literalExpr.getType()) for the numeric branches).`
| } | ||
|
|
||
| @Test | ||
| public void testUserVarIntegerTypeBig() { |
There was a problem hiding this comment.
Long.MAX_VALUE is still a legacy IntLiteral with BIGINT type, so this test never exercises the newly added LARGEINT path. Please add a case using a real legacy org.apache.doris.analysis.LargeIntLiteral (for example "9223372036854775808"), otherwise the remaining bug in getLiteralForUserVar() will not be caught.
|
/review |
There was a problem hiding this comment.
One blocking regression found.
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/ExpressionAnalyzer.java:visitUnboundVariable()now returnsvariable.getRealExpression()during analysis, so analyzed trees no longer containVariablenodes at all. That breaks at least one existing semantic check: generated-column validation explicitly rejects variables only after analysis inCreateTableInfo.checkExpressionInGeneratedColumn()(fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/CreateTableInfo.java:1190-1195,1256-1261). With this patch, a generated column likec AS (@x + 1)is analyzed into a literal expression before that check runs, so the prohibition is silently bypassed. This is a real behavioral regression, not just a missed test, because the old flow keptVariableuntil the laterVariableToLiteralrule.
Critical checkpoints:
- Goal / correctness: Partially accomplished. The PR addresses user-variable typing and early type coercion, but it also introduces a generated-column correctness regression, so the overall goal is not safely met. Existing tests do not cover this path.
- Scope / minimality: Not minimal enough. The fix changes the general analysis-time semantics of
UnboundVariable, which affects unrelated callers beyond the reported bug. - Concurrency: No meaningful new concurrency concerns in the touched code.
- Lifecycle: No special lifecycle/static-init issues identified.
- Config: No config changes.
- Compatibility: No FE/BE protocol or storage compatibility changes identified.
- Parallel paths: Yes, there was already a parallel
VariableToLiteralpath; changing the earlier analysis path without revalidating all downstreamVariableconsumers caused the regression above. - Special conditions/comments: The added defensive null-check is harmless, but the behavior change needs a stronger invariant explanation because it affects analysis semantics globally.
- Test coverage: Insufficient. Added FE/regression tests cover the motivating bug, but there is no negative test for contexts that must still reject variables after analysis, such as generated columns.
- Test results: New regression output looks consistent for the added query.
- Observability: No new observability needs for this small FE logic change.
- Transaction/persistence: Not applicable.
- Data writes/modifications: The generated-column regression can persist invalid metadata expressions, so this area is affected indirectly and needs correction before merge.
- FE/BE variable passing: Not applicable.
- Performance: No meaningful performance concern from the intended fix itself.
- Other issues: None blocking beyond the semantic regression above.
Please keep Variable nodes available until all semantic validations that rely on them have run, or move/duplicate those validations to the parsed-expression phase before replacing variables with literals, and add a regression/unit test for generated-column rejection.
| if (sqlCacheContext.isPresent()) { | ||
| sqlCacheContext.get().addUsedVariable(variable); | ||
| } | ||
| } |
There was a problem hiding this comment.
This changes UnboundVariable analysis from producing a Variable node to producing its literal value immediately. That fixes the early type-coercion problem, but it also removes Variable from every analyzed tree, which breaks downstream semantic checks that intentionally inspect analyzed expressions for Variable. One concrete case is generated columns: CreateTableInfo analyzes the expression first and only then rejects instanceof Variable (checkExpressionInGeneratedColumn()), so after this change a generated column like c AS (@x + 1) can slip through as a plain literal expression instead of being rejected. The old flow kept the Variable until the later VariableToLiteral rule, so that check still worked. Please either preserve Variable until those validations finish, or move the generated-column variable rejection earlier and add coverage for it.
…g issue in expression analysis
|
/review |
97915f8 to
5738026
Compare
There was a problem hiding this comment.
Reviewed commit 97915f88ece394e9b2e95be119015e1c2a3c6fad.
Blocking issues found.
Critical checkpoint conclusions:
- Goal / correctness: Partially met. The BIGINT/LARGEINT
window_funnelcase is covered, but the PR still leaves small-width user-variable typing broken and introduces an analyzer regression by erasingVariablenodes too early. - Small / focused change: Not fully. The SQL-cache bookkeeping change in
ExpressionAnalyzeralso changes the analyzed tree shape, which is broader than needed. - Concurrency: No concurrency concerns in the touched code.
- Lifecycle / initialization: No special lifecycle or static-init issues.
- Config: No new configuration.
- Compatibility: No FE-BE/protocol/storage format change, but generated-column metadata correctness regresses because forbidden variables can now be folded into persisted expressions.
- Parallel paths:
VariableToLiteralalready performs variable-to-literal replacement later in analysis; bypassing it here skips later checks that still expectVariablenodes. - Conditional checks: The added defensive null-check is harmless, but it is attached to the broader semantic regression above.
- Test coverage: Insufficient for the changed behavior. The added tests cover the
WINDOW_FUNNELbigint case and SQL-cache recording, but they miss a negative generated-column case and an exact small-int case such asCURTIME(@p). - Test result updates: The updated regression output looks consistent with the covered case.
- Observability: No additional observability is needed.
- Transaction / persistence / data writes: A generated column created from a variable expression would persist the wrong analyzed expression, so this is a metadata correctness issue.
- Performance: No material concerns.
Please address the inline comments before approving.
| sqlCacheContext.get().addUsedVariable(variable); | ||
| } | ||
| } | ||
| return variable.getRealExpression(); |
There was a problem hiding this comment.
Returning the literal here changes the analyzed tree too early. VariableToLiteral already replaces variables later in analysis, but callers like CreateTableInfo.generatedColumnCheck() analyze first and only then reject expr instanceof Variable (fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/CreateTableInfo.java:1187-1261). With this change, a generated-column expression containing @x is accepted and the current session value gets baked into the persisted expression instead of being rejected.
| return variable.getRealExpression(); | |
| return variable; |
| import java.math.BigInteger; | ||
| import java.nio.ByteBuffer; | ||
| import java.util.ArrayList; | ||
| import java.util.Collections; |
There was a problem hiding this comment.
This still widens TINYINT/SMALLINT user variables to INTEGER, so exact-width signatures remain broken. SET @p = 3 stores a TINYINT via LogicalPlanBuilder.visitIntegerLiteral() and TinyIntLiteral.toLegacyLiteral(), but this branch still returns Literal.of((int) ...), so SELECT CURTIME(@p) is analyzed as current_time(INT) and fails against the existing current_time(TINYINT) signature in CurrentTime. If the goal is to preserve the original integer type, the TINYINT and SMALLINT cases need explicit byte/short handling too.
|
/review |
There was a problem hiding this comment.
Found blocking issues in both the implementation and the newly added FE unit tests.
- The Variable->literal timing fix is still incomplete.
ExpressionTraitnow unwrapsVariablefor callers that go throughgetArguments*(), butBindExpressionstill binds functions before the laterVariableToLiteralanalyzer pass runs. Functions that validate rawchild()nodes still hit the old timing bug. A concrete reachable example isSET @p = 3; SELECT utc_time(@p)(alsoutc_timestamp(@p)andchar(@cs, 65)):ExpressionAnalyzer.visitUnboundFunction()callsTypeCoercionUtils.processBoundFunction()immediately, and those implementations still checkchild(0).isLiteral()/instanceof StringLikeLiteral, so they reject theVariablewrapper before the new helper is ever consulted. UserVariableAnalysisTest.testUserVarIntegerTypeSmall()asserts the wrong type.new IntLiteral(1L)is a legacyTINYINT, and the newConnectContextlogic now preserves that width, so this assertion expects the pre-fix behavior and will fail once FE UTs are run.UserVariableAnalysisTest.testVariableRecordedAndTypeCoercion()no longer matches the implementation. The patch removed SQL-cache recording fromExpressionAnalyzer.visitUnboundVariable(); recording now happens inReplaceVariableByLiteral. This test only callsExpressionAnalyzer.analyze(...), sousedVariablesstays empty and the assertion will fail.
Critical checkpoints
- Goal of current task: Partially achieved. Integer width preservation and the
window_funnelregression path are covered, but the general Variable->literal timing issue is not fully fixed and the added FE UTs do not currently prove the intended behavior. - Modification size/focus: Mostly contained, but the cross-cutting
ExpressionTraitchange still leaves raw-child parallel paths untouched. - Concurrency: No concurrency, locking, or lifecycle concerns in the touched code.
- Configuration / compatibility: No new config, storage-format, or FE-BE compatibility concerns were introduced here.
- Parallel code paths: Not fully covered; functions that use raw
child()/isLiteral()checks bypass the new helpers. - Special conditional checks: The integer-width switch itself is straightforward, but the remaining typed/user-variable paths are still uneven.
- Test coverage: Regression coverage was added for
window_funnel, but the new FE unit tests currently contain invalid expectations and do not cover the remaining raw-child function paths. - Test result modifications:
regression-test/data/query_p0/set/test_user_var.outlooks consistent with the added single-row aggregate case. - Observability / transaction / persistence / data-write concerns: Not applicable in this PR.
Environment note: I could not execute FE UTs in this runner because thirdparty/installed/bin/protoc is missing, and run-fe-ut.sh stops during generated-source setup.
| return children(); | ||
| ImmutableList.Builder<Expression> arguments = ImmutableList.builder(); | ||
| for (Expression arg : children()) { | ||
| if (arg instanceof Variable && ((Variable) arg).getRealExpression() != null) { |
There was a problem hiding this comment.
This only fixes callers that go through getArguments*(), but BindExpression still binds functions before the later VariableToLiteral analyzer pass runs. That means any function that validates raw child() nodes still hits the same timing bug. A concrete reachable example is SET @p = 3; SELECT utc_time(@p) (also utc_timestamp(@p) and char(@cs, 65)): ExpressionAnalyzer.visitUnboundFunction() immediately calls TypeCoercionUtils.processBoundFunction(), and those implementations still check child(0).isLiteral() / instanceof StringLikeLiteral, so they reject the Variable wrapper before this helper is ever consulted. So this fixes helper-based callers like window_funnel, but not the general Variable->literal timing issue described in the PR.
| // get nereids literal via ConnectContext helper | ||
| Literal l = ConnectContext.get().getLiteralForUserVar("a"); | ||
| Assertions.assertNotNull(l); | ||
| Assertions.assertEquals(IntegerType.INSTANCE, l.getDataType()); |
There was a problem hiding this comment.
new IntLiteral(1L) is a legacy TINYINT, not an INT (IntLiteral.init() picks the smallest fitting width). With the new ConnectContext logic this now becomes a TinyIntLiteral, so this assertion is expecting the pre-fix behavior and will fail once FE UTs are run. If the intent is to verify INT, construct the legacy literal with Type.INT; otherwise assert TinyIntType.
| Assertions.assertNotNull(analyzed); | ||
|
|
||
| // sql cache context should have recorded the used variable | ||
| List<Variable> used = cascadesContext.getStatementContext().getSqlCacheContext().get().getUsedVariables(); |
There was a problem hiding this comment.
This test no longer matches the final implementation. The patch removed SQL-cache recording from ExpressionAnalyzer.visitUnboundVariable() and only ReplaceVariableByLiteral calls sqlCacheContext.addUsedVariable(...). ExpressionAnalyzer.analyze(cmp, rewriteContext) never runs that rewrite, so usedVariables stays empty here. To test SQL-cache recording you need to run the analyzer job / VariableToLiteral pass, or explicitly rewrite analyzed with ReplaceVariableByLiteral before asserting.
|
/review |
There was a problem hiding this comment.
Findings:
ConnectContext.getLiteralForUserVar()now depends on the stored legacy literal keeping its original primitive type, but the forwarded FE path still drops that information.FEOpExecutor.getForwardUserVariables()serializes anIntLiteral(Type.BIGINT/INT/SMALLINT/...)asINT_LITERALwithmsg.typeset, whileConnectProcessor.getLiteralExprFromThrift()reconstructs everyINT_LITERALasnew IntLiteral(value)and ignoresnode.type. A query forwarded through an observer/follower can therefore still turnSET @v = CAST(1 AS BIGINT)into a TINYINT user variable before this new code runs, so the integer-typing bug is only fixed on the local FE path.- The new
ExpressionTraitbehavior only helps call sites that already go throughgetArgument*()/getArguments*(). A lot of builtins still inspect rawchild()nodes duringcheckLegalityBeforeTypeCoercion()/computeSignature(), and those checks run before the laterVariableToLiteralanalyzer rule. For example,SELECT UTC_TIMESTAMP(@p)still fails:visitUnboundVariable()producesVariable(realExpression=literal),visitUnboundFunction()buildsUtcTimestamp(Variable), thenTypeCoercionUtils.processBoundFunction()callsUtcTimestamp.checkLegalityBeforeTypeCoercion(), which rejects!child(0).isLiteral(). So the PR fixes thewindow_funnel-style helper path but leaves the same user-variable bug class in other functions.
Critical Checkpoints:
- Goal of the task: partially met. The PR fixes local user-variable integer typing and helper-based argument/type resolution, and the added tests prove the
window_funnelpath, but it does not cover forwarded FE execution or functions that still validate rawchild()nodes. - Scope/minimality:
ConnectContextis focused; theExpressionTraitchange is broad and changes semantics for every function helper even though the remaining failures still come from bind-time raw-child checks. - Concurrency: no new concurrency-sensitive logic here.
- Lifecycle/static initialization: not applicable.
- Configuration: none.
- Compatibility / parallel paths: not complete. The forwarded FE path for user variables and the raw-
child()function-validation path are parallel code paths that remain inconsistent with the new behavior. - Special conditional checks: the new integer-type switch is understandable, but it assumes upstream type preservation that is not true on the forwarded path.
- Test coverage: insufficient for the changed behavior. Local unit/regression coverage exists, but there is no coverage for FE forwarding or other builtin functions that still see
Variablewrappers directly. - Test result changes: the regression output looks correct for the covered case.
- Observability: no new observability needs.
- Transaction/persistence/data-write concerns: not applicable.
- FE-BE / FE-FE variable passing: incomplete; FE-to-FE forwarding still loses integer subtype information.
- Performance: no blocking issue found, but the global
ExpressionTraitrewrite adds work to frequently used helpers. - Other issues: none beyond the blockers above.
Overall opinion: request changes until the forwarded FE path preserves the original integer subtype and the remaining bind-time raw-child validation paths are handled consistently for user variables.
| } else if (literalExpr instanceof IntLiteral) { | ||
| } else if (literalExpr instanceof IntLiteral || literalExpr instanceof LargeIntLiteral) { | ||
| // the value in the IntLiteral should be int, but now is long in old planner literalExpr | ||
| // so type coercion to generate right new planner int Literal |
There was a problem hiding this comment.
This fix still misses the forwarded-FE path. getLiteralForUserVar() now relies on the stored legacy literal keeping its original primitive type, but FEOpExecutor.getForwardUserVariables() serializes all of TINYINT/SMALLINT/INT/BIGINT as INT_LITERAL, and ConnectProcessor.getLiteralExprFromThrift() reconstructs that as new IntLiteral(value) while ignoring node.type. So SET @v = CAST(1 AS BIGINT) executed through an observer/follower still degrades to a TINYINT before this method runs. ExprToThriftVisitor already sends msg.type, so the real fix needs to preserve that type when rebuilding the legacy IntLiteral; otherwise the integer-typing bug remains on forwarded queries.
| * getArguments. | ||
| */ | ||
| default List<Expression> getArguments() { | ||
| return children(); |
There was a problem hiding this comment.
Unwrapping Variable here only fixes callers that already use getArgument*() / getArguments*(). A lot of function legality/signature code still reads raw child() nodes before the later VariableToLiteral analyzer rule runs, so the same user-variable bug is still present there. Concrete example: SELECT UTC_TIMESTAMP(@p) with @p = 3 still fails, because visitUnboundVariable() returns Variable(realExpression=literal), visitUnboundFunction() builds UtcTimestamp(Variable), and then TypeCoercionUtils.processBoundFunction() immediately calls UtcTimestamp.checkLegalityBeforeTypeCoercion(), which rejects !child(0).isLiteral(). The same pattern exists in utc_time, uniform, width_bucket, etc. I think this needs to be fixed earlier in the bind/type-check path, or those raw-child checks need to be updated as well; otherwise this patch only fixes the window_funnel-style subset of functions.
PR fixes two bugs:
(1) user/session variable integer typing was wrong when converted to Nereids literals;
(2) Ensure ExpressionTrait resolves argument expressions and types from a Variable's real expression when present, and falls back to the Variable itself when not. This prevents generating incorrect function signatures using Variable placeholders.
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)